FIGURE 2.10
Overview of the proposed Q-DETR framework. We introduce the distribution rectification distillation method (DRD) to refine the performance of Q-DETR. From left to right, we respectively show the detailed decoder architecture of Q-DETR and the learning framework of Q-DETR. The Q-Backbone, Q-Encoder, and Q-Decoder denote quantized architectures.

inaccurate object localization. Therefore, a more generic method for DETR quantization is necessary.

To tackle the issue above, we propose an efficient low-bit quantized DETR (Q-DETR) [257] by rectifying the query information of the quantized DETR to match that of its real-valued counterpart. Figure 2.10 provides an overview of our Q-DETR, which is mainly accomplished by a distribution rectification knowledge distillation method (DRD). We find that knowledge transfer from the real-valued teacher to the quantized student is ineffective primarily because of the information gap and distortion. Therefore, we formulate our DRD as a bi-level optimization framework established on the information bottleneck (IB) principle. Generally, it includes an inner-level optimization that maximizes the self-information entropy of student queries and an upper-level optimization that minimizes the conditional information entropy between student and teacher queries. At the inner level, we conduct a distribution alignment for the query guided by its Gaussian-like distribution, as shown in Fig. 2.8, leading to an explicit state in compliance with its maximum information entropy in the forward propagation. At the upper level, we introduce a new foreground-aware query matching that filters out low-quality student queries for exact one-to-one query matching between student and teacher, providing valuable knowledge gradients that push toward the minimum conditional information entropy in the backward propagation.
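To make the structure of this bi-level objective concrete, the following PyTorch sketch illustrates one plausible realization. The standardization rule used for the distribution alignment, the MSE surrogate for the conditional-entropy term, and the `fg_mask` construction are simplifying assumptions for illustration only, not the exact formulation of DRD.

```python
import torch
import torch.nn.functional as F

def rectify_queries(q):
    """Inner level (illustrative): align the query distribution.

    For a Gaussian-like query distribution, entropy is maximized by an
    explicit, well-spread state; as a stand-in for the paper's
    distribution alignment we simply standardize each query channel
    (zero mean, unit variance) in the forward pass.
    """
    mu = q.mean(dim=(0, 1), keepdim=True)
    sigma = q.std(dim=(0, 1), keepdim=True)
    return (q - mu) / (sigma + 1e-6)

def drd_distillation_loss(student_q, teacher_q, fg_mask):
    """Upper level (illustrative): penalize the gap between matched
    student and teacher queries, which drives the conditional entropy
    between them toward its minimum.

    `fg_mask` is a hypothetical boolean mask that keeps only student
    queries matched one-to-one to foreground teacher queries (the
    foreground-aware query matching); its construction is not shown.
    """
    s = rectify_queries(student_q)[fg_mask]
    t = teacher_q[fg_mask]
    return F.mse_loss(s, t)

# Hypothetical shapes: (num_queries, batch, embed_dim), as in DETR decoders.
student_q = torch.randn(300, 2, 256)
teacher_q = torch.randn(300, 2, 256)
fg_mask = torch.rand(300, 2) > 0.7   # placeholder for matched foreground queries
loss = drd_distillation_loss(student_q, teacher_q, fg_mask)
```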

2.4.1 Quantized DETR Baseline

We first construct a baseline to study the low-bit DETR since no relevant work has been proposed. To this end, we follow LSQ+ [13] to introduce a general framework of asymmetric activation quantization and symmetric weight quantization:

$$
\begin{aligned}
x_q &= \left\lfloor \mathrm{clip}\Big\{\tfrac{x - z}{\alpha_x},\, Q_n^x,\, Q_p^x\Big\} \right\rceil, &
w_q &= \left\lfloor \mathrm{clip}\Big\{\tfrac{w}{\alpha_w},\, Q_n^w,\, Q_p^w\Big\} \right\rceil, \\
Q_a(x) &= \alpha_x \circ x_q + z, &
Q_w(w) &= \alpha_w \circ w_q,
\end{aligned}
\tag{2.24}
$$

where $\mathrm{clip}\{y, r_1, r_2\}$ clips the input $y$ with value bounds $r_1$ and $r_2$; $\lfloor y \rceil$ rounds $y$ to its nearest integer; $\circ$ denotes the channel-wise multiplication. And $Q_n^x = -2^{a-1}$, $Q_p^x =$
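As a concrete illustration of Eq. (2.24), the PyTorch sketch below applies the asymmetric activation quantizer and the symmetric weight quantizer. The 4-bit width, the signed integer bounds, and the per-tensor (rather than channel-wise) scales are assumptions made for brevity; the straight-through gradient estimation and the scale/offset learning of LSQ+ are omitted.

```python
import torch

def fake_quantize_activation(x, alpha_x, z, bits=4):
    """Asymmetric activation quantization in the style of Eq. (2.24):
    shift by the offset z, scale by the step size alpha_x, clip to the
    integer bounds, round, then de-quantize back to the real domain.
    """
    qn, qp = -2 ** (bits - 1), 2 ** (bits - 1) - 1
    # In training, the non-differentiable round is bypassed with a
    # straight-through estimator; omitted here for clarity.
    xq = torch.clamp((x - z) / alpha_x, qn, qp).round()
    return alpha_x * xq + z          # Q_a(x)

def fake_quantize_weight(w, alpha_w, bits=4):
    """Symmetric weight quantization in the style of Eq. (2.24)."""
    qn, qp = -2 ** (bits - 1), 2 ** (bits - 1) - 1
    wq = torch.clamp(w / alpha_w, qn, qp).round()
    return alpha_w * wq              # Q_w(w)

# Example usage with per-tensor scales (values are placeholders).
x = torch.randn(2, 256, 32, 32)
x_deq = fake_quantize_activation(x, alpha_x=torch.tensor(0.05), z=torch.tensor(0.1))
w = torch.randn(256, 256, 3, 3)
w_deq = fake_quantize_weight(w, alpha_w=torch.tensor(0.02))
```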